Fixes #4388: Correct transcription_delay metric calculation in STT turn detec… #4396
@codex review
Codex Review: Didn't find any major issues. Chef's kiss.
davidzhao
left a comment
@devbyteai please remove PR_DESCRIPTION.md from the commit
ee893ae to 0e2ffb6
📝 Walkthrough: Replaced sentinel checks for …
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@livekit-agents/livekit/agents/voice/audio_recognition.py`:
- Around line 452-453: The condition uses `self._last_speaking_time == 0` but
`_last_speaking_time` is initialized to None and reset to None, so replace
comparisons to 0 with explicit None checks; update the three spots in
audio_recognition.py where you see `if not self._vad or self._last_speaking_time
== 0` (and similar at the other two locations) to `if not self._vad or
self._last_speaking_time is None` so START_OF_SPEECH only sets the timestamp
when it truly hasn't been set, and END_OF_SPEECH/other branches behave
correctly; ensure you update all occurrences that reference
`_last_speaking_time` in the relevant methods to use `is None`.
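A minimal sketch of why the `== 0` comparison never fires when the attribute starts as `None`. The `Recognizer` class and its two handler methods are hypothetical stand-ins, not the actual `audio_recognition.py` code; only the `_last_speaking_time` attribute, its `None` initialization, and the two conditions are taken from the comment above.

```python
from __future__ import annotations

import time


class Recognizer:
    """Hypothetical stand-in for the recognizer state discussed above."""

    def __init__(self, vad: object | None = None) -> None:
        self._vad = vad
        # Initialized to None, as described in the review comment.
        self._last_speaking_time: float | None = None

    def on_start_of_speech_buggy(self) -> None:
        # `None == 0` is False, so with a VAD present this branch is never taken.
        if not self._vad or self._last_speaking_time == 0:
            self._last_speaking_time = time.time()

    def on_start_of_speech_fixed(self) -> None:
        # An explicit None check sets the timestamp when it has not been set yet.
        if not self._vad or self._last_speaking_time is None:
            self._last_speaking_time = time.time()


recognizer = Recognizer(vad=object())  # any truthy object stands in for a VAD
recognizer.on_start_of_speech_buggy()
buggy_value = recognizer._last_speaking_time  # still None
recognizer.on_start_of_speech_fixed()
fixed_value = recognizer._last_speaking_time  # now a float timestamp
```

Without a VAD (`vad=None`) both variants set the timestamp, because `not self._vad` short-circuits the condition; the difference only shows up when a VAD is configured.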
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
uv.lock is excluded by !**/*.lock
📒 Files selected for processing (1)
livekit-agents/livekit/agents/voice/audio_recognition.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Format code with ruff
Run ruff linter and auto-fix issues
Run mypy type checker in strict mode
Maintain line length of 100 characters maximum
Ensure Python 3.9+ compatibility
Use Google-style docstrings
Files:
livekit-agents/livekit/agents/voice/audio_recognition.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: unit-tests
- GitHub Check: type-check (3.13)
- GitHub Check: type-check (3.9)
…tion mode
Fixes livekit#4388
Remove the line that overwrites _last_speaking_time at END_OF_SPEECH in STT mode. This was causing transcription_delay to always be ~0 since END_OF_SPEECH typically arrives after FINAL_TRANSCRIPT, making both timestamps nearly identical.
0e2ffb6 to 0cee66e
davidzhao
left a comment
lg!
Summary
Fixes #4388
This PR fixes the incorrect `transcription_delay` metric calculation when using STT-based turn detection (e.g., Deepgram Flux).
Problem
When using STT turn detection mode, the `transcription_delay` metric incorrectly shows ~0 seconds instead of reflecting the actual transcription latency.
User-Reported Behavior:
The metric should measure the time between when the user stopped speaking and when the transcript was received, but it was always returning near-zero values.
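As a rough illustration of the intended measurement, the sketch below computes the delay from two timestamps. The function name, the clamping at zero, the `None` handling, and the timestamp values are all assumptions made for this example and are not taken from `audio_recognition.py`.

```python
from __future__ import annotations


def transcription_delay(
    last_speaking_time: float | None,
    last_final_transcript_time: float,
) -> float:
    """Seconds between the last speaking time and the final transcript.

    Hypothetical helper: returns 0.0 when no speaking time was recorded and
    clamps negative differences to 0.0 (both are assumptions).
    """
    if last_speaking_time is None:
        return 0.0
    return max(last_final_transcript_time - last_speaking_time, 0.0)


# Hypothetical timestamps in seconds: speech stops at 10.0 and the final
# transcript arrives at 10.4, so the expected delay is roughly 0.4 s.
expected_delay = transcription_delay(10.0, 10.4)

# If the speaking time is recorded at almost the same moment as the
# transcript, the metric collapses to near zero, which is the reported symptom.
collapsed_delay = transcription_delay(10.399, 10.4)
```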
Root Cause
In `audio_recognition.py`, `transcription_delay` is calculated as `last_final_transcript_time - last_speaking_time`.
The bug was in the STT END_OF_SPEECH handler (line 452), which overwrote `_last_speaking_time` with `time.time()`.
Event Timeline in STT Mode (Buggy):
1. START_OF_SPEECH: `_last_speaking_time = time.time()` (correct)
2. FINAL_TRANSCRIPT: `_last_final_transcript_time = time.time()` (correct)
3. END_OF_SPEECH: `_last_speaking_time = time.time()` (BUG - overwrites!)
Since END_OF_SPEECH typically arrives shortly after FINAL_TRANSCRIPT in STT mode, both timestamps become nearly identical, resulting in `transcription_delay ≈ 0`.
Solution
Remove the line that overwrites `_last_speaking_time` at END_OF_SPEECH in STT mode. The value was already correctly set at START_OF_SPEECH.
Comparison with VAD Mode:
VAD mode does NOT update `_last_speaking_time` at END_OF_SPEECH - it keeps the value from the last INFERENCE_DONE event. STT mode should follow the same pattern.
After Fix:
1. START_OF_SPEECH: `_last_speaking_time = time.time()` (preserved)
2. FINAL_TRANSCRIPT: `_last_final_transcript_time = time.time()`
Result:
`transcription_delay = last_final_transcript_time - last_speaking_time` now correctly represents the actual transcription latency.
Testing
All 15 existing agent session tests pass.
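Separately from the existing suite, the event timeline described in this PR can be replayed in a small sketch. `TimelineSketch`, its handler names, the `overwrite_on_end_of_speech` switch, and the timestamps are hypothetical; only the two attribute names and the subtraction come from the description, and the real handlers in `audio_recognition.py` do more than this.

```python
from __future__ import annotations


class TimelineSketch:
    """Hypothetical, simplified model of the STT-mode timestamps."""

    def __init__(self, overwrite_on_end_of_speech: bool) -> None:
        self._overwrite = overwrite_on_end_of_speech
        self._last_speaking_time: float | None = None
        self._last_final_transcript_time: float | None = None

    def on_start_of_speech(self, now: float) -> None:
        self._last_speaking_time = now

    def on_final_transcript(self, now: float) -> None:
        self._last_final_transcript_time = now

    def on_end_of_speech(self, now: float) -> None:
        if self._overwrite:
            # Pre-fix behavior: the speaking time is overwritten here.
            self._last_speaking_time = now

    def transcription_delay(self) -> float:
        if self._last_speaking_time is None or self._last_final_transcript_time is None:
            raise ValueError("both timestamps must be recorded first")
        return self._last_final_transcript_time - self._last_speaking_time


def replay(overwrite_on_end_of_speech: bool) -> float:
    """Replay START_OF_SPEECH, FINAL_TRANSCRIPT, END_OF_SPEECH with fixed times."""
    timeline = TimelineSketch(overwrite_on_end_of_speech)
    timeline.on_start_of_speech(0.0)
    timeline.on_final_transcript(1.5)
    timeline.on_end_of_speech(1.6)  # arrives shortly after the final transcript
    return timeline.transcription_delay()


buggy_delay = replay(overwrite_on_end_of_speech=True)   # close to zero
fixed_delay = replay(overwrite_on_end_of_speech=False)  # 1.5 - 0.0
```

Passing the time in as an argument instead of calling `time.time()` keeps the replay deterministic, which makes the near-zero result of the overwriting variant easy to compare against the preserved one.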
Backward Compatibility
No breaking changes - This fix only corrects the metric calculation. The actual agent behavior (speech recognition, turn detection, interruption handling) is completely unchanged.
Expected Impact:
- … `transcription_delay` values in their metrics
Edge Cases Handled
- … `_last_speaking_time` for each new segment
- … `_last_final_transcript_time` correctly
Files Changed
- `livekit-agents/livekit/agents/voice/audio_recognition.py`: removed the `self._last_speaking_time = time.time()` line from END_OF_SPEECH handler
Related Issues